Supplementary Material: Reinforced Video Captioning with Entailment Rewards

Authors

  • Ramakanth Pasunuru
  • Mohit Bansal
Abstract

Our attention baseline model is similar to the Bahdanau et al. (2015) architecture: we encode the input frame-level video features with a bidirectional LSTM-RNN and then generate the caption with a single-layer LSTM-RNN decoder that uses an attention mechanism. Let {f1, f2, ..., fn} be the frame-level features of a video clip and {w1, w2, ..., wm} be the sequence of words forming a caption. The distribution of words at time step t given the previously generated words and input video frame-level features is given as follows:
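
For illustration only, the following is a minimal NumPy sketch of one additive-attention decoding step in the general style of Bahdanau et al. (2015). It is not the authors' equation or implementation. The encoder states are assumed to be already computed by the bidirectional LSTM-RNN, the decoder LSTM update itself is omitted, and all names (`attention_step`, `word_distribution`, `W_h`, `W_s`, `v`, `W_out`) and dimensions are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_step(enc_states, dec_state, W_h, W_s, v):
    """Additive attention over encoder states (assumed form, not the paper's).

    enc_states: (n, d_enc) encoder hidden states, one per video frame
    dec_state:  (d_dec,)   current decoder hidden state
    """
    # Score each frame: v^T tanh(W_h h_i + W_s s)
    scores = np.tanh(enc_states @ W_h + dec_state @ W_s) @ v  # shape (n,)
    alpha = softmax(scores)            # attention weights over frames
    context = alpha @ enc_states       # weighted sum of encoder states, (d_enc,)
    return alpha, context

def word_distribution(dec_state, context, W_out):
    # Hypothetical output layer: softmax over the vocabulary.
    logits = np.concatenate([dec_state, context]) @ W_out
    return softmax(logits)

# Toy dimensions and random parameters, chosen only for the example.
rng = np.random.default_rng(0)
n, d_enc, d_dec, d_att, vocab = 5, 8, 6, 4, 10
enc_states = rng.normal(size=(n, d_enc))
dec_state = rng.normal(size=d_dec)
W_h = rng.normal(size=(d_enc, d_att))
W_s = rng.normal(size=(d_dec, d_att))
v = rng.normal(size=d_att)
W_out = rng.normal(size=(d_dec + d_enc, vocab))

alpha, context = attention_step(enc_states, dec_state, W_h, W_s, v)
p_word = word_distribution(dec_state, context, W_out)
```

Here `alpha` holds one weight per frame and `p_word` is a probability distribution over the toy vocabulary at a single time step; a full decoder would repeat this step for each generated word.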

Similar Resources

Reinforced Video Captioning with Entailment Rewards

Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training. First, using policy gradient and mixed-loss methods for reinforcement learning, we directly optimize sentence-level task-based metrics (as rewards), achieving significant improvements over the baseline, based on both automatic m...

Supplementary Material: Multi-Task Video Captioning with Video and Entailment Generation

1.1.1 Video Captioning Datasets. YouTube2Text or MSVD: The Microsoft Research Video Description Corpus (MSVD), or YouTube2Text (Chen and Dolan, 2011), is used for our primary video captioning experiments. It has 1970 YouTube videos in the wild with many diverse captions in multiple languages for each video. Caption annotations for these videos are collected using Amazon Mechanical Turk (AMT). All ou...

Multi-Task Video Captioning with Video and Entailment Generation

Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data. We improve video captioning by sharing knowledge with two related directed-generati...

End-to-End Video Captioning with Multitask Reinforcement Learning

Although end-to-end (E2E) learning has led to promising performance on a variety of tasks, it is often impeded by hardware constraints (e.g., GPU memories) and is prone to overfitting. When it comes to video captioning, one of the most challenging benchmark tasks in computer vision and machine learning, those limitations of E2E learning are especially amplified by the fact that both the input v...

Durable Glass Fiber Reinforced Concrete with Supplimentary Cementitious Materials

Durability of concrete structures in marine environments has been a major issue for many decades due to chloride attack. Chloride penetrates the concrete structure and accelerates the corrosion of the reinforcement, which shortens the life of those structures. Shrinkage cracks in concrete also play a main role in chloride penetration through the concrete surface. Many researchers tried to find easy and ...

Publication date: 2017